Alternative Relative Discrimination Criterion Feature Ranking Technique for Text Classification

Authors

Abstract

The use of text data with high dimensionality affects classifier performance. Therefore, efficient feature selection (FS) is necessary to reduce dimensionality. In classification challenges, FS algorithms based on a ranking approach are employed to improve classification performance. To rank terms, most algorithms, such as the Relative Discrimination Criterion (RDC) and the Improved Relative Discrimination Criterion (IRDC), use document frequency (DF) and term frequency (TF). TF accepts the actual values of frequently and rarely occurring terms used in existing algorithms. However, these approaches focus on the number of term occurrences rather than on the category. In this research, an alternative method to RDC, called the Alternative Relative Discrimination Criterion (ARDC), was proposed, which aims to improve the accuracy and effectiveness of RDC ranking. Specifically, ARDC is designed to identify terms that occur commonly in the positive class. The results obtained were compared with those of RDC-based methods, namely RDC and IRDC, and with standard benchmarking functions, namely Information Gain (IG), the Pearson Correlation Coefficient (PCC), and ReliefF. The experimental results reveal that the suggested method, evaluated on the Reuters21578, 20newsgroup, and TDT2 datasets, provides better performance in terms of precision, recall, and f-measure when employing the well-known classifiers multinomial naïve Bayes (MNB), Support Vector Machine (SVM), Multilayer Perceptron (MLP), k-nearest neighbor (KNN), and decision tree (DT). Another experiment was performed to validate the proposed technique and showcase the novelty of the approach. It utilized the 20newsgroup dataset and the Relevant-Based Feature Ranking (RBFR) technique, with Naïve Bayes (NB), Random Forest (RF), and Logistic Regression (LR) classifiers used to demonstrate the effectiveness of ARDC.
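As a rough illustration of the ranking-based FS pipeline the abstract describes, the Python sketch below scores each term with a simplified RDC-style criterion (the difference between positive-class and negative-class document-frequency ratios at each term-count value, weighted by the term count), keeps the top-ranked terms, and trains a multinomial naïve Bayes classifier. The scoring function, the binary 20newsgroup category pair, and the top-500 cutoff are illustrative assumptions, not the exact ARDC formula or experimental setup from the paper.

```python
# Minimal sketch only: a simplified RDC-style ranking filter, not the paper's
# exact ARDC formula. Assumes scikit-learn; the category pair, max_tc, and the
# top-500 cutoff are arbitrary illustrative choices.
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import f1_score

cats = ["rec.autos", "sci.med"]   # second category treated as the "positive" class
train = fetch_20newsgroups(subset="train", categories=cats,
                           remove=("headers", "footers", "quotes"))
test = fetch_20newsgroups(subset="test", categories=cats,
                          remove=("headers", "footers", "quotes"))

vec = CountVectorizer(stop_words="english", min_df=3)
X_tr = vec.fit_transform(train.data)
X_te = vec.transform(test.data)
y_tr, y_te = train.target, test.target

def rdc_like_scores(X, y, positive=1, max_tc=10):
    """Simplified RDC-style criterion: for each term-count value tc, take the
    absolute difference between the fractions of positive and negative
    documents containing the term exactly tc times, weighted by 1/tc."""
    counts = X.toarray()
    pos, neg = counts[y == positive], counts[y != positive]
    scores = np.zeros(counts.shape[1])
    for tc in range(1, max_tc + 1):
        tp = (pos == tc).mean(axis=0)   # doc-frequency ratio in the positive class
        fp = (neg == tc).mean(axis=0)   # doc-frequency ratio in the negative class
        scores += np.abs(tp - fp) / tc
    return scores

scores = rdc_like_scores(X_tr, y_tr)
top_k = np.argsort(scores)[::-1][:500]          # keep the 500 highest-ranked terms
clf = MultinomialNB().fit(X_tr[:, top_k], y_tr)
pred = clf.predict(X_te[:, top_k])
print("F1 with top-500 ranked terms:", round(f1_score(y_te, pred), 3))
```

The filter is class-aware by construction: a term only scores highly when its distribution over term counts differs between the positive and negative classes, which is the general idea behind RDC-family ranking criteria.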


Similar Resources

Feature Selection Technique for Text Document Classification: An Alternative Approach

Text classification and feature selection play an important role in correctly assigning documents to particular categories, owing to the explosive growth of textual information from electronic digital documents as well as the World Wide Web. The present challenge in text mining is to select important or relevant features from the large and vast amount of features in a data set. The aim o...


Accuracy Based Feature Ranking Metric for Multi-Label Text Classification

In many application domains, such as machine learning, scene and video classification, data mining, medical diagnosis, and machine vision, instances belong to more than one category. Feature selection in single-label text classification is used to reduce the dimensionality of datasets by filtering out irrelevant and redundant features. The process of dimensionality reduction in multi-label cla...


An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification

The Internet provides easy access to all kinds of library resources. However, classifying documents within a large amount of data is still an issue and demands time and energy to find certain documents. Classifying similar documents into specific classes can reduce the time spent searching for the required data, particularly text documents. This is further facilitated by using Artificial...


An Improved Flower Pollination Algorithm with AdaBoost Algorithm for Feature Selection in Text Documents Classification

In recent years, the production of text documents has grown exponentially, which is why their proper classification is necessary for better access. One of the main problems in classifying text documents is working in a high-dimensional feature space. Feature Selection (FS) is one way to reduce the number of text attributes. So, working with a great bulk of the feature spa...


Category Discrimination Based Feature Selection Algorithm in Chinese Text Classification

How to improve classification precision is a major issue in the field of Chinese text classification. The tf-idf algorithm is a classic and widely used feature selection algorithm based on the vector space model (VSM). However, the traditional tf-idf algorithm neglects a feature term's distribution inside a category and among categories, which causes many unreasonable selection results. This paper makes an improvement t...

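For context on the limitation mentioned in the entry above, the short sketch below (a generic illustration, not the improved algorithm proposed in that paper) shows that a plain tf-idf ranking is computed from term and document frequencies alone, so the category distribution of terms never enters the score. The toy corpus and its implied categories are made up for demonstration.

```python
# Generic illustration only: plain tf-idf scores terms without consulting
# category labels, which is the limitation the abstract above points out.
# Assumes scikit-learn; the toy corpus is invented for demonstration.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "the engine and the gearbox need repair",   # an "autos"-like document
    "the doctor prescribed a new medicine",     # a "medicine"-like document
    "gearbox oil and engine coolant levels",
    "medicine dosage advised by the doctor",
]
vec = TfidfVectorizer()
X = vec.fit_transform(corpus)

# Rank terms by their mean tf-idf weight over all documents; the class
# membership of the documents plays no role in this score.
mean_tfidf = np.asarray(X.mean(axis=0)).ravel()
ranking = sorted(zip(vec.get_feature_names_out(), mean_tfidf), key=lambda p: -p[1])
for term, weight in ranking[:5]:
    print(f"{term:10s} {weight:.3f}")
```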


Journal

Journal Title: IEEE Access

Year: 2023

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2023.3294563